1. INTRODUCTION


Computer simulation, an extremely useful and popular technique in operations research and management science, is often used to study the behavior of very complex real-world systems. Unfortunately, simulation models of complex systems tend to be extremely complicated themselves because of the meticulous detail ordinarily included in such models. Furthermore, the computer codes corresponding to these models are usually extraordinarily large and very long-running.

Often, simulation users cannot readily assimilate the information contained in large, complex codes. They are overwhelmed by the vast number of factors (i.e., input variables) and are unsure how to analyze the model effectively without performing an excessive number of costly and time-consuming simulation runs. If users could identify the most important factors in some reasonable way, they could make the model more manageable and their analysis more efficient by concentrating the major experimental effort on the key factors.

Factor screening methods are statistical methods that attempt to identify the more important variables. (See, for example, [4], [6], and [8].) A basic function of these methods is to sort all the factors into two primary groups. One group consists of the "important" factors, which are judged worthwhile to investigate further. The other consists of the remaining "unimportant" factors.

When selecting an appropriate factor screening method, one must primarily consider the number of runs available for screening. In the simulation environment, the number of factors to be screened almost always exceeds the available number of simulation runs. In statistical experimental design terminology, this is known as a supersaturated situation. Such a situation is common in simulation for two reasons: a large number of factors is usually under consideration, and computer runs are time-consuming and costly.

Although many strategies have been suggested for designing and conducting screening experiments, few are applicable to the supersaturated case. Furthermore, for the few that are, there has been no systematic evaluation and comparison of their performance. In this paper we provide quantitative information on a supersaturated screening strategy based on random balance sampling ([1], [10], [13]). In addition, we compare this strategy with a modified strategy based on a combination of random balance and Plackett-Burman designs [9].


2. PRELIMINARY DISCUSSION


To provide a common statistical basis for comparing and assessing screening strategies, we must make some assumptions about the general structure of a simulation model. For detecting the factors having major effects, it is usually reasonable to assume

$$ y_i = \beta_0 + \sum_{j=1}^{K} \beta_j x_{ij} + \epsilon_i, \tag{2.1} $$

where:

- $y_i$ is the value of the response (i.e., output variable) in the $i$th simulation run;
- $K$ is the total number of factors to be screened, each of which is at two levels ($\pm 1$);
- $x_{ij} = \pm 1$, depending on the level of the $j$th factor during the $i$th simulation run;
- $\beta_j$ is the (linear) effect of the $j$th factor; and
- the error terms $\epsilon_i$ are independent, normally distributed random disturbances with zero mean and variance $\sigma^2$.

In essence, model (2.1) is a first-order Taylor series approximation to the actual relationship between output and input variables. Ordinarily we would use this approximation over a relatively small region of the factor space. We will restrict performance evaluation to this model.


Random Balance Sampling

An experiment involving random balance sampling is based on an experimental design that is itself random. In a two-level ($\pm 1$) random balance design, each column of the design matrix consists of $N/2$ $+1$'s and $N/2$ $-1$'s, where $N$ (an even number) denotes the total number of runs to be made. The $+1$'s and $-1$'s in each column are assigned randomly, so that all possible arrangements of $N/2$ $+1$'s and $N/2$ $-1$'s are equally likely (there are $\binom{N}{N/2}$ of them). Each column receives an independent randomization.
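The construction just described can be sketched in a few lines of Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' software. The function name `random_balance_design` is hypothetical, and so are the seed and the sizes used in the example.

```python
import numpy as np


def random_balance_design(n_runs: int, n_factors: int, rng: np.random.Generator) -> np.ndarray:
    """Return an N x K random balance design with entries +1/-1.

    Every column holds exactly N/2 +1's and N/2 -1's, and each column is an
    independent, uniformly random arrangement of that balanced pattern.
    """
    if n_runs % 2 != 0:
        raise ValueError("N must be even")
    base = np.repeat([1, -1], n_runs // 2)  # N/2 +1's followed by N/2 -1's
    # Permute the balanced pattern independently for each of the K columns.
    return np.column_stack([rng.permutation(base) for _ in range(n_factors)])


rng = np.random.default_rng(0)
X = random_balance_design(12, 40, rng)  # supersaturated: K = 40 factors, only N = 12 runs
print(X.shape)                          # (12, 40)
print(np.abs(X.sum(axis=0)).max())      # 0: every column is balanced
```

Nothing in the construction ties $N$ to $K$, which is the flexibility the next paragraph emphasizes.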

The principal advantage of random balance (RB) sampling for use in screening is its flexibility. We can select $N$ independently of $K$; unlike more traditional experimental designs, there is no mathematical restriction or relationship between $N$ and $K$. A second advantage is the ease with which we can prepare RB designs for any $N$ and $K$, an important consideration when $K$ is large.

RB sampling has two corresponding disadvantages. The first is that factors are confounded to a random degree, so we cannot generally control the amount of confounding or interdependence between factors. The second is that there is no specialized or unique technique for analyzing RB designs. The simplest approach is to consider each factor separately and apply a standard F-test. We should mention, however, that practically any technique used to analyze data without RB properties can be used to analyze any (sufficiently small) subset of factors in an RB design. This is done by simply ignoring any factor not included in the particular set of factors being analyzed.

In this paper, we use a standard F-test applied separately to each factor as the method of analysis for random balance data. For simplicity, we conduct each F-test at the same level of significance $\alpha_1$. An RB strategy is therefore completely determined once we specify $N$ and $\alpha_1$, and we denote such a strategy by RB$(N, \alpha_1)$. We classify a factor as important only if it has a significant F-ratio.

Modified Strategy


We now consider a two-stage screening strategy: an RB$(N, \alpha_1)$ first stage followed by a second-stage Plackett-Burman (PB) design. A given factor enters the second-stage PB experiment only if it has a significant F-ratio in the first-stage RB experiment. In this combination strategy, we declare important those factors that reach the second stage and have a significant effect there.

Because PB designs are orthogonal, the second stage separates any confounding between factors carried over from the RB first stage. Factors not formally included in the second-stage experiment are held at a constant level so as not to bias any of the second-stage estimates. Further, unlike RB designs, PB designs can be analyzed by the usual analysis of variance procedures for factorial experiments. We denote this combination strategy by RP$(N, \alpha_1, \alpha_2)$, where $\alpha_2$ is the significance level used in all second-stage F-tests.

The total number of runs $R$ required by an RP strategy is therefore $N + M$, where $M$ denotes the number of second-stage runs. Although we can specify $N$, $M$ depends on the number of factors $S$ carried over from the first stage. For reasons of economy, and to avoid design saturation (i.e., no degrees of freedom to estimate experimental error), we employ the smallest PB design that guarantees at least one error degree of freedom. Since PB designs are only available for numbers of runs that are multiples of four, this convention yields a minimum of one and a maximum of four error degrees of freedom.

Accordingly, we can write $M$ mathematically as $M = B(S+1)$, where $B(x) = x + 4 - (x \bmod 4)$. Thus, $R = N + M = N + B(S+1)$.

We should emphasize, however, that $R$ is random (since $M$ is). Hence, in an RP$(N, \alpha_1, \alpha_2)$ strategy we do not know, prior to experimentation, the exact number of runs that will be required. This, of course, is a disadvantage of the RP strategy. However, noting that $B(x) \approx x + 2.5$, we can approximate $E(R)$ by

$$ E(R) \approx N + E(S) + 3.5. \tag{2.2} $$

Since $|B(x) - (x + 2.5)| \le 1.5$, the approximation in (2.2) can differ from $E(R)$ by at most 1.5 runs. In Section 3 we will show how the quantities $N$, $\alpha_1$, and $\alpha_2$ affect the performance of the RB and RP screening strategies.
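The run-count convention and the bound behind (2.2) are easy to check numerically. The sketch below is only an illustration; the helper names `pb_runs`, `second_stage_runs`, and `error_df` are hypothetical. It tabulates $M = B(S+1)$ and the error degrees of freedom $M - (S+1)$ for small $S$, then confirms that $|B(x) - (x + 2.5)|$ never exceeds 1.5.

```python
def pb_runs(x: int) -> int:
    """B(x) = x + 4 - (x mod 4): the smallest multiple of 4 strictly greater than x."""
    return x + 4 - (x % 4)


def second_stage_runs(s: int) -> int:
    """M = B(S + 1): PB runs for S carried-over factors plus the overall mean."""
    return pb_runs(s + 1)


def error_df(s: int) -> int:
    """Error degrees of freedom left in the second-stage PB design."""
    return second_stage_runs(s) - (s + 1)


for s in range(1, 9):
    print(f"S={s}: M={second_stage_runs(s)}, error df={error_df(s)}")

# The approximation B(x) ~ x + 2.5 behind (2.2) is never off by more than 1.5 runs.
print(max(abs(pb_runs(x) - (x + 2.5)) for x in range(1, 1000)))  # 1.5
```

Because $B(x) - x$ cycles through 4, 3, 2, 1, the error degrees of freedom always fall between one and four, as stated above.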


3. PERFORMANCE ASSESSMENT


In general, a factor screening strategy has three objectives:

(1) to detect as many important factors as possible;
(2) to declare important as few unimportant factors as possible; and
(3) to perform as few runs as possible.

In short, one must consider both how many runs a strategy requires and how accurately it classifies factors. It is difficult, however, to dichotomize factors as either important or unimportant. From a practical standpoint, the importance (or unimportance) of a factor depends on the magnitude of its effect relative to the experimental error, $\sigma$, and to the magnitudes of the other effects present. Importance is therefore essentially a matter of degree. The greater (lesser) the degree of importance, the larger should be the probability of classifying the factor as important (unimportant).

In this section we provide formulas that summarize the performance of the RB$(N, \alpha_1)$ and RP$(N, \alpha_1, \alpha_2)$ strategies. Performance is measured in two ways: the number (or expected number) of runs a strategy requires, and the strategy's sensitivity (i.e., power) for declaring a factor important. Comparing strategies will require tradeoffs. Indeed, objectives (1) and (2), which deal with factor classification, conflict with objective (3), which deals with testing cost. In many ways the screening problem resembles testing a statistical hypothesis, where we want the sample size to be small but the power (i.e., the probability of rejecting a false null hypothesis) to be large. Our intent is to provide the simulation user with quantitative information on the tradeoffs involved.

To establish some notation, let $R_{1j} = 1$ if the $j$th factor is declared important by an RB$(N, \alpha_1)$ strategy, and $R_{1j} = 0$ otherwise. Similarly, let $R_{2j} = 1$ if the $j$th factor is declared important by an RP$(N, \alpha_1, \alpha_2)$ strategy, and $R_{2j} = 0$ otherwise.

Except in the simplest cases, $P(R_{1j} = 1)$ and $P(R_{2j} = 1)$ are too complex to be evaluated analytically. In lieu of exact solutions, we develop approximations to these probabilities. In addition, we present an approximation to the expected value of $R$, the total number of runs required by an RP$(N, \alpha_1, \alpha_2)$ strategy. The total number of runs required by an RB$(N, \alpha_1)$ strategy is, of course, $N$.


The RB Strategy


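We can write model (2.1) in matrix terms as $\mathbf{y} = \beta_0 \mathbf{1} + X\boldsymbol{\beta} + \boldsymbol{\epsilon}$, where:

- $\mathbf{1}$ is an $N \times 1$ vector of $+1$'s;
- $\mathbf{y}$ is an $N \times 1$ vector of responses;
- $\boldsymbol{\epsilon}$ is an $N \times 1$ vector of error terms;
- $\boldsymbol{\beta}$ is a $K \times 1$ vector of factor effects; and
- $X$ is an $N \times K$ design matrix.

In a random balance experiment, $X = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_K]$ is a stochastic matrix whose $j$th column, $\mathbf{x}_j$, is an $N \times 1$ vector consisting of a random arrangement of $N/2$ $+1$'s and $N/2$ $-1$'s. By construction, the $K$ column vectors of $X$ are independent. We assume that $X$ and $\boldsymbol{\epsilon}$ are independent. The simple least squares estimator of $\beta_j$ ($j \ge 1$) is given by

$$ \hat\beta_j = (\bar y_{+j} - \bar y_{-j})/2, \tag{3.1} $$

where $\bar y_{+j}$ ($\bar y_{-j}$) is the average value of the response over the $N/2$ runs at the $+1$ ($-1$) level of the factor. (By simple least squares we mean that each $\beta_j$ is estimated ignoring all other factors.) In matrix terms, since $\mathbf{x}_j'\mathbf{1} = 0$,

$$ \hat\beta_j = \Big( \sum_{i=1}^{K} \beta_i \mathbf{x}_i'\mathbf{x}_j + \mathbf{x}_j'\boldsymbol{\epsilon} \Big) \Big/ N. \tag{3.2} $$

Thus,

$$ E(\hat\beta_j) = \frac{1}{N}\Big[ \sum_{i=1}^{K} \beta_i E(\mathbf{x}_i'\mathbf{x}_j) + E(\mathbf{x}_j'\boldsymbol{\epsilon}) \Big], \tag{3.3} $$

and

$$ V(\hat\beta_j) = \frac{1}{N^2}\Big[ \sum_{i=1}^{K} \beta_i^2 V(\mathbf{x}_i'\mathbf{x}_j) + V(\mathbf{x}_j'\boldsymbol{\epsilon}) \Big]. \tag{3.4} $$

In (3.4) we use the fact that $\mathbf{x}_j'\boldsymbol{\epsilon}, \mathbf{x}_j'\mathbf{x}_1, \mathbf{x}_j'\mathbf{x}_2, \ldots, \mathbf{x}_j'\mathbf{x}_K$ are mutually independent for fixed $j$.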

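The exact sampling distribution of $\hat\beta_j$ is clearly intractable. However, using results derived in the Appendix, we can easily show that

$$ E(\hat\beta_j) = \beta_j, \tag{3.5} $$

$$ V(\hat\beta_j) = (\tau^2 - \beta_j^2)/(N - 1) + \sigma^2/N, \tag{3.6} $$

$$ \operatorname{cov}(\hat\beta_i, \hat\beta_j) = \beta_i\beta_j/(N - 1), \quad i \ne j, \tag{3.7} $$

where $\tau^2 = \sum_{m=1}^{K} \beta_m^2$. The simple least squares estimator defined in (3.1) is therefore an unbiased estimator of $\beta_j$, although its variance can be seriously inflated. Moreover, replacing $N - 1$ by $N$ in (3.6) and (3.7) shows that the correlation between $\hat\beta_i$ and $\hat\beta_j$ is roughly

$$ \operatorname{corr}(\hat\beta_i, \hat\beta_j) \approx \beta_i\beta_j/(\phi_i\phi_j), \tag{3.8} $$

where $\phi_m^2 = \tau^2 + \sigma^2 - \beta_m^2$.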

The correlation in (3.8) measures the confounding between $\hat\beta_i$ and $\hat\beta_j$. Since $N$ does not appear in (3.8), we reach a somewhat surprising conclusion: increasing $N$ does not decrease the confounding in an RB design when simple least squares is the estimation method. Furthermore, the degree of confounding between $\hat\beta_i$ and $\hat\beta_j$ depends on $\sigma$ and on the magnitudes of the other effects in the model.
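A quick Monte Carlo experiment can check (3.5) through (3.8). The sketch below assumes NumPy and uses a hypothetical setup: three nonzero effects, $\sigma = 1$, and an arbitrary intercept. It draws many random balance designs and compares the empirical moments of the simple least squares estimates with the formulas. It is an illustrative check, not the authors' Monte Carlo study.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, sigma, reps = 16, 30, 1.0, 5000
beta = np.zeros(K)
beta[:3] = [3.0, -2.0, 1.5]            # hypothetical effects; the other 27 factors are inert
tau2 = np.sum(beta**2)

# reps independent N x K random balance designs, built by ranking uniform keys per column.
base = np.repeat([1.0, -1.0], N // 2)
X = base[np.argsort(rng.random((reps, N, K)), axis=1)]     # shape (reps, N, K)
y = 10.0 + X @ beta + rng.normal(0.0, sigma, (reps, N))    # model (2.1); intercept 10 is arbitrary
est = np.einsum("rnk,rn->rk", X, y) / N                    # simple least squares, eq. (3.2)

pred_var = (tau2 - beta**2) / (N - 1) + sigma**2 / N       # eq. (3.6)
emp_var = est.var(axis=0, ddof=1)
pred_cov = beta[0] * beta[1] / (N - 1)                     # eq. (3.7)
emp_cov = np.cov(est[:, 0], est[:, 1])[0, 1]
phi = np.sqrt(tau2 + sigma**2 - beta**2)
approx_corr = beta[0] * beta[1] / (phi[0] * phi[1])        # eq. (3.8)
emp_corr = np.corrcoef(est[:, 0], est[:, 1])[0, 1]

print("mean of estimates:", np.round(est.mean(axis=0)[:4], 2))
print("variance, predicted vs empirical:", np.round(pred_var[:4], 3), np.round(emp_var[:4], 3))
print("cov(b1, b2):", round(float(pred_cov), 3), round(float(emp_cov), 3))
print("corr(b1, b2):", round(float(approx_corr), 3), round(float(emp_corr), 3))
```

Rerunning with a larger $N$ should leave the empirical correlation close to the value from (3.8), in line with the observation above.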

Regarding formal significance testing, note that a single-factor F-test of whether $\hat\beta_j$ differs significantly from zero is equivalent to a simple two-sample t-test between the high ($+1$) and the low ($-1$) levels of the factor. The associated test statistic $t_j$ is given by

$$ t_j = \hat\beta_j \Big/ \big[\mathrm{SSE}_j / \{N(N-2)\}\big]^{1/2}, \tag{3.9} $$

where $\mathrm{SSE}_j$ is the familiar analysis of variance notation for the error sum of squares of factor $j$. Computationally,

$$ \mathrm{SSE}_j = \sum (y_i - \bar y_{+j})^2 + \sum (y_i - \bar y_{-j})^2, \tag{3.10} $$

where the first (second) summation is taken over the $N/2$ observations at the high (low) level of the $j$th factor. An alternative, more direct computational formula is

$$ \mathrm{SSE}_j = \sum_{i=1}^{N} y_i^2 - N\hat\beta_j^2 - N\bar y^2, \tag{3.11} $$

where $\bar y$ is the overall mean of the $N$ responses.
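Equations (3.9) and (3.11) can be checked against an ordinary pooled two-sample t statistic. The sketch below assumes NumPy and uses one hypothetical balanced column with made-up responses. Both functions are illustrative and are not taken from the original study.

```python
import numpy as np


def t_stat_rb(y: np.ndarray, x: np.ndarray) -> float:
    """t_j from eq. (3.9), with SSE_j computed by the shortcut formula (3.11)."""
    N = len(y)
    beta_hat = (x @ y) / N                                   # equals (ybar_plus - ybar_minus) / 2
    sse = np.sum(y**2) - N * beta_hat**2 - N * y.mean()**2   # eq. (3.11)
    return float(beta_hat / np.sqrt(sse / (N * (N - 2))))


def pooled_two_sample_t(y: np.ndarray, x: np.ndarray) -> float:
    """Textbook pooled-variance two-sample t between the +1 runs and the -1 runs."""
    hi, lo = y[x == 1], y[x == -1]
    sp2 = (np.sum((hi - hi.mean())**2) + np.sum((lo - lo.mean())**2)) / (len(y) - 2)
    return float((hi.mean() - lo.mean()) / np.sqrt(sp2 * (1 / len(hi) + 1 / len(lo))))


rng = np.random.default_rng(2)
x = rng.permutation(np.repeat([1, -1], 10))        # one balanced column, N = 20
y = 5.0 + 0.8 * x + rng.normal(size=20)            # hypothetical responses
print(t_stat_rb(y, x), pooled_two_sample_t(y, x))  # the two values agree
```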

Following normal theory, we reject $H_0\colon \beta_j = 0$ in favor of $H_1\colon \beta_j \ne 0$ if the observed value of $|t_j|$ equals or exceeds $t(N-2; \alpha_1/2)$. This is the upper $100(\alpha_1/2)$ percentage point of Student's t-distribution with $(N-2)$ degrees of freedom. Assuming that normal theory is adequate, we can approximate the distribution of $t_j$ by a noncentral t-distribution with $(N-2)$ degrees of freedom and noncentrality parameter

$$ \delta_{1j} = N^{1/2}\beta_j/\phi_j. \tag{3.12} $$

Therefore,

$$ P(R_{1j} = 1) \approx \psi_1(\delta_{1j}), \tag{3.13} $$

where $\psi_1(\delta) = P\{|T_{N-2}(\delta)| \ge t(N-2; \alpha_1/2)\}$, and $T_\nu(\delta)$ denotes a random variable having a noncentral t-distribution with $\nu$ degrees of freedom and noncentrality parameter $\delta$.
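Under the noncentral-t approximation, (3.12) and (3.13) can be evaluated directly with SciPy's `scipy.stats.t` and `scipy.stats.nct` distributions. The sketch below shows one way to do this, assuming SciPy is available. The helper names `psi1` and `delta1` and the example effect vector are hypothetical.

```python
import numpy as np
from scipy import stats


def psi1(delta: float, N: int, alpha1: float) -> float:
    """psi_1(delta) = P{|T_{N-2}(delta)| >= t(N-2; alpha1/2)}, as in (3.13)."""
    df = N - 2
    crit = stats.t.ppf(1 - alpha1 / 2, df)   # upper alpha1/2 point of Student's t
    T = stats.nct(df, delta)
    return float(T.sf(crit) + T.cdf(-crit))  # two-sided rejection probability


def delta1(beta: np.ndarray, j: int, N: int, sigma: float) -> float:
    """Noncentrality (3.12): N^(1/2) beta_j / phi_j, phi_j^2 = tau^2 + sigma^2 - beta_j^2."""
    tau2 = np.sum(beta**2)
    phi_j = np.sqrt(tau2 + sigma**2 - beta[j]**2)
    return float(np.sqrt(N) * beta[j] / phi_j)


beta = np.zeros(50)
beta[:4] = [3.0, 2.0, 1.0, 0.5]              # hypothetical effects, in units of sigma
for j in range(5):
    d = delta1(beta, j, N=24, sigma=1.0)
    print(j, round(d, 3), round(psi1(d, N=24, alpha1=0.1), 3))
```

For an inert factor, $\delta_{1j} = 0$ and $\psi_1$ reduces to $\alpha_1$.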
The RP Strategy


To evaluate $P(R_{2j} = 1)$ in an RP$(N, \alpha_1, \alpha_2)$ strategy, we first observe that

$$ P(R_{2j} = 1) = P(R_{1j} = 1)\, P(R_{2j} = 1 \mid R_{1j} = 1). \tag{3.14} $$

We also note that $S = \sum_{j=1}^{K} R_{1j}$, where, as defined previously, $S$ denotes the number of factors carried over from the RB$(N, \alpha_1)$ first stage to the PB second stage. Next, we define $S_j = S - R_{1j}$ and write

$$ \begin{aligned} P(R_{2j} = 1 \mid R_{1j} = 1) &= \sum_{s=1}^{K} P(R_{2j} = 1, S_j = s - 1 \mid R_{1j} = 1) \\ &= \sum_{s=1}^{K} P(S_j = s - 1 \mid R_{1j} = 1)\, P(R_{2j} = 1 \mid S_j = s - 1, R_{1j} = 1) \\ &= \sum_{s=1}^{K} P(S_j = s - 1 \mid R_{1j} = 1)\, P(R_{2j} = 1 \mid S = s, R_{1j} = 1). \end{aligned} \tag{3.15} $$


Using well-known testing properties of PB designs,

$$ P(R_{2j} = 1 \mid S = s, R_{1j} = 1) = \psi_2(s, \delta_{2j}), \tag{3.16} $$

where

$$ \psi_2(s, \delta) = P\{|T_{d(s)}(\delta)| \ge t(d(s); \alpha_2/2)\}, \tag{3.17} $$

$$ \delta_{2j} = [B(s+1)]^{1/2}\beta_j/\sigma, \tag{3.18} $$

and

$$ d(s) = B(s+1) - (s+1). \tag{3.19} $$

The conditional probability in (3.16) is the probability that the $j$th factor tests significantly different from zero in the PB second stage, given that it and $s - 1$ other factors tested significant in the RB$(N, \alpha_1)$ first stage. The quantity $d(s)$ is the number of error degrees of freedom in a PB design for the study of $s$ factors in $B(s+1)$ runs.
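The second-stage power (3.17) depends on $s$ in two ways: through the design size $B(s+1)$ and through $d(s)$. The sketch below assumes SciPy and uses hypothetical helper names. It tabulates $\psi_2$ for a factor with $\beta_j = \sigma$. Because $B(s+1)$ jumps in steps of four while $d(s)$ shrinks between jumps, the power is not monotone in $s$.

```python
from scipy import stats


def pb_runs(x: int) -> int:
    """B(x) = x + 4 - (x mod 4)."""
    return x + 4 - (x % 4)


def psi2(s: int, beta_j: float, sigma: float, alpha2: float) -> float:
    """psi_2(s, delta_2j) from (3.17)-(3.19) for a factor with effect beta_j."""
    M = pb_runs(s + 1)                      # PB runs when s factors are carried over
    d = M - (s + 1)                         # error degrees of freedom d(s), eq. (3.19)
    delta2 = M ** 0.5 * beta_j / sigma      # noncentrality delta_2j, eq. (3.18)
    crit = stats.t.ppf(1 - alpha2 / 2, d)
    T = stats.nct(d, delta2)
    return float(T.sf(crit) + T.cdf(-crit))


for s in range(1, 10):
    print(s, pb_runs(s + 1), round(psi2(s, beta_j=1.0, sigma=1.0, alpha2=0.1), 3))
```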

Summarizing up to this point, we have

$$ P(R_{2j} = 1) \approx \psi_1(\delta_{1j}) \sum_{s=1}^{K} \psi_2(s, \delta_{2j})\, P(S_j = s - 1 \mid R_{1j} = 1). \tag{3.20} $$


To complete our evaluation of $P(R_{2j} = 1)$, we must approximate the conditional distribution of $S_j$ given $R_{1j} = 1$ and substitute it into equation (3.20). This is a difficult task, because that conditional distribution is extremely complex. We consider two approximations:

- For moderately sized $N$, the conditional distribution of $S_j$ given $R_{1j} = 1$ might reasonably be approximated as the convolution of $(K - 1)$ independent Bernoulli random variables with success probabilities $\{\psi_1(\delta_{1m}) : m = 1, 2, \ldots, j-1, j+1, \ldots, K\}$.
- Alternatively, following Feller [3], for large $N$ and moderate values of $\lambda_j = \sum_{m \ne j} \psi_1(\delta_{1m})$, we might reasonably approximate this conditional distribution by a Poisson distribution with mean $\lambda_j$.

The Poisson approach is generally much easier to apply than the Bernoulli convolution approach, particularly if $K$ is large.
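With either approximation for $S_j$, equation (3.20) is straightforward to evaluate numerically. The sketch below assumes NumPy and SciPy and uses hypothetical helper names and a made-up effect vector. It truncates the Poisson distribution at $K - 1$. It is meant as an illustration, not as the computation behind Tables 1 and 2.

```python
import numpy as np
from scipy import stats


def pb_runs(x: int) -> int:
    return x + 4 - (x % 4)                                 # B(x)


def two_sided_power(df: int, delta: float, alpha: float) -> float:
    crit = stats.t.ppf(1 - alpha / 2, df)
    T = stats.nct(df, delta)
    return float(T.sf(crit) + T.cdf(-crit))


def rp_power(beta, j, N, sigma, alpha1, alpha2, method="poisson"):
    """Approximate P(R_2j = 1) via (3.20), using either approximation for S_j."""
    beta = np.asarray(beta, dtype=float)
    K = len(beta)
    phi = np.sqrt(np.sum(beta**2) + sigma**2 - beta**2)
    psi1 = np.array([two_sided_power(N - 2, np.sqrt(N) * b / p, alpha1)
                     for b, p in zip(beta, phi)])          # psi_1(delta_1m) for every factor m
    others = np.delete(psi1, j)                            # the K - 1 factors other than j
    if method == "poisson":
        # Poisson(lambda_j) probabilities for S_j = 0, ..., K-1 (upper tail truncated).
        pmf = stats.poisson.pmf(np.arange(K), others.sum())
    else:
        # Distribution of a sum of independent Bernoullis, by repeated convolution.
        pmf = np.array([1.0])
        for p in others:
            pmf = np.convolve(pmf, [1.0 - p, p])           # final length K: S_j = 0, ..., K-1
    stage2 = sum(pmf[s - 1] * two_sided_power(pb_runs(s + 1) - (s + 1),
                                              np.sqrt(pb_runs(s + 1)) * beta[j] / sigma,
                                              alpha2)
                 for s in range(1, K + 1))                 # the sum over s in (3.20)
    return float(psi1[j] * stage2)


beta = np.zeros(40)
beta[:4] = 2.0                                             # hypothetical: four effects of 2*sigma
for method in ("poisson", "bernoulli"):
    print(method, round(rp_power(beta, 0, 16, 1.0, 0.2, 0.1, method), 3))
```

For an inert factor, the same function returns approximately $\alpha_1\alpha_2$, because $\psi_2(s, 0) = \alpha_2$ for every $s$.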

Regarding the expected number of runs required by an RP$(N, \alpha_1, \alpha_2)$ strategy, we observe that

$$ E(S) = \sum_{j=1}^{K} E(R_{1j}) \approx \sum_{j=1}^{K} \psi_1(\delta_{1j}). \tag{3.21} $$

Introducing (3.21) into (2.2), we obtain

$$ E(R) \approx N + 3.5 + \sum_{j=1}^{K} \psi_1(\delta_{1j}). \tag{3.22} $$
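Equation (3.22) involves only the first stage, so $\alpha_2$ does not enter. The sketch below assumes SciPy's vectorized `nct.sf` and `nct.cdf` and uses a hypothetical helper name and effect pattern. It evaluates the approximation for a few values of $N$.

```python
import numpy as np
from scipy import stats


def expected_runs_rp(beta, N: int, sigma: float, alpha1: float) -> float:
    """Approximate E(R) for an RP(N, alpha1, alpha2) strategy via (3.21)-(3.22)."""
    beta = np.asarray(beta, dtype=float)
    phi = np.sqrt(np.sum(beta**2) + sigma**2 - beta**2)
    delta1 = np.sqrt(N) * beta / phi                       # (3.12) for every factor
    crit = stats.t.ppf(1 - alpha1 / 2, N - 2)
    psi1 = stats.nct.sf(crit, N - 2, delta1) + stats.nct.cdf(-crit, N - 2, delta1)
    return float(N + 3.5 + psi1.sum())                     # E(S) approximated by sum(psi1)


beta = np.zeros(200)
beta[:10] = 2.0                                            # hypothetical effect pattern
for N in (20, 40, 60):
    print(N, round(expected_runs_rp(beta, N, sigma=1.0, alpha1=0.1), 1))
```

With no active factors, the expression reduces to $N + 3.5 + K\alpha_1$, which is a convenient sanity check.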


Monte Carlo Results

As a check on the various approximations presented in this section, we conducted two Monte Carlo case studies, whose results are summarized in Tables 1 and 2. The results are extremely encouraging. They suggest that the approximations of $P(R_{1j} = 1)$, $P(R_{2j} = 1)$, and $E(R)$ given in (3.13), (3.20), and (3.22), respectively, are quite reasonable for practical purposes. In the authors' experience [5], these approximations are fairly reasonable even for relatively small values of $N$.

As Table 1 shows, the Bernoulli convolution and Poisson approaches yield essentially the same approximations to $P(R_{2j} = 1)$. Given this agreement, and the complexity of the Bernoulli convolution calculations for the second case study, we used only the Poisson approach to evaluate $P(R_{2j} = 1)$ in the second example. In that case, the nonzero factor effects vary in absolute magnitude between $\sigma$ and $6\sigma$. Moreover, as might be expected in an actual screening situation, most of the effects are relatively small.
4. PRACTICAL IMPLICATIONS AND CONSIDERATIONS


In this section we discuss a number of practical considerations and implications regarding the RB and RP screening strategies. In addition, we use a hypothetical situation to illustrate numerically a direct application of the results of Section 3.

We note, first, that an increase in $N$ or $\alpha_1$ increases the power of an RB$(N, \alpha_1)$ strategy. Each increase has a cost. A larger $N$ raises experimental costs by requiring more screening runs. A larger $\alpha_1$ correspondingly raises the probability of declaring a negligible factor important.

We should further point out that applying an F-test separately to each factor is not necessarily the most powerful method of analysing data from a random balance experiment. Presumably, more sophisticated statistical techniques that analyze more than one factor at a time (such as least squares stepwise or stagewise methods) would provide greater power. However, in some applications such methods may not be computationally feasible, particularly if $K$ is exceedingly large, and they can be severely limited if $N$ is very small relative to $K$. The individual F-test approach is generally a relatively quick and easy testing procedure. Moreover, it admits a tractable quantitative assessment, whereas more sophisticated analytical techniques lead to an intractable problem.

In any event, we suspect that the results of Section 3 can be regarded as a lower bound for the discriminatory power of alternative, more sophisticated RB analysis and testing procedures.

To increase the power of an RP$(N, \alpha_1, \alpha_2)$ strategy, adjustments can be made in the RB$(N, \alpha_1)$ first stage, the PB second stage, or both. An increase in $\alpha_1$ does not affect the number of runs in an RB$(N, \alpha_1)$ strategy, but it does affect an RP$(N, \alpha_1, \alpha_2)$ strategy. There, an increase in $\alpha_1$ will usually increase the number of second-stage runs and hence the total number of runs.

To increase the power of a PB experiment, one can employ a larger PB design or a larger level of significance $\alpha_2$ in the analysis. The power of the second-stage analysis might also be increased if, after examining the data, some factor effects could reasonably be assumed negligible. In such cases, one could pool the sums of squares associated with these factors into the error sum of squares. This yields a pooled error estimate with more degrees of freedom than the unpooled estimate. If the pooled effects are indeed negligible, the increased error degrees of freedom will translate into greater power. Pooling is especially appealing when there is only one degree of freedom for error. Caution should be exercised, however, since pooling tends to diminish the denominator expected mean square of the F-ratio.

To help determine which effects, if any, one might reasonably combine into a pooled error estimate, the estimated effects can be plotted on normal probability paper. In this technique (e.g., Daniel [2]), negligible effects should fall approximately along a straight line, while large effects should tend to fall far from the line. Because the interpretation of the plot relies heavily on subjective judgment, however, a quantitative assessment of the merits of normal plotting is not possible.

In Section 2 we noted one advantage of an RB strategy over an RP strategy: for an RB strategy the number of required screening runs is fixed rather than random. Beyond this simple observation, it is difficult to make general statements or recommendations about the relative merits of the two strategies without further study. Toward this end, we have studied the problem under some simplified assumptions.

We consider the case in which $k$ of the $K$ factors to be screened have the same absolute effect, say $A > 0$, and the remaining $K - k$ factors are inactive, that is, have zero effect. In this case, we can identify three basic measures of performance, $c$, $a$, and $\psi$, defined by

$$ c = (\text{total expected number of screening runs})/K, \tag{4.1} $$

$$ a = P\{\text{declare important an inactive factor}\}, \tag{4.2} $$

and

$$ \psi = P\{\text{declare important an active factor}\}. \tag{4.3} $$

By active, we mean one of the $k$ factors whose absolute effect equals $A$. For future reference, we define $p = k/K$.

For an RB$(N, \alpha_1)$ strategy, clearly $c = N/K$ and $a = \alpha_1$, and $\psi$ can be determined from (3.13). For an RP$(N, \alpha_1, \alpha_2)$ strategy, $c = E(R)/K$ and $a = \alpha_1\alpha_2$, since an inactive factor passes the first stage with probability $\alpha_1$ and the second with probability $\alpha_2$. Here $\psi$ and $E(R)$ can be determined from (3.20) and (3.22), respectively.

Note that specifying $c$ and $a$ uniquely determines the corresponding RB strategy, namely RB$(cK, a)$, where $cK$ is assumed to be an even integer. With this in mind, we developed a computer search routine that, for any given RB$(cK, a)$ strategy, attempts to find an RP strategy having the same $c$ and $a$ but greater power.

Using our search routine, we examined twelve cases: $K = 100, 200, 500$; $p = .05, .15$; and $A/\sigma = 2, 8$. Furthermore, to maintain a supersaturated situation, we considered $.1 < c < .9$.

Figure 1 presents the results of the investigation graphically. Each curve in the figure represents one of the six combinations of $K$ and $p$ studied. For points $(c, a)$ above a curve, no RP$(N, \alpha_1, \alpha_2)$ strategy could improve on the performance of the corresponding RB$(cK, a)$ strategy. For points $(c, a)$ below a curve, some RP$(N, \alpha_1, \alpha_2)$ strategy outperforms the corresponding RB$(cK, a)$ strategy. Each curve thus bounds what we refer to as a "cone of inadmissibility." As Figure 1 shows, the $a$ level defining each boundary curve increases as the sample size increases.

Furthermore, for the cases examined, we found that $r = A/\sigma$ had virtually no effect on the results. We can offer two reasons for this:

1. In an RB experiment where all nonzero effects have equal absolute magnitude, the noncentrality parameter $\delta_1$ associated with these factors satisfies $\delta_1^2 = N/(k - 1 + r^{-2})$, by (3.12). If $r$ is large relative to $(k - 1)^{-1/2}$, then $\delta_1^2 \approx N/(k - 1)$.
2. In a PB experiment, the noncentrality parameter associated with factors of equal absolute effect is $\delta_2 = M^{1/2} r$, where $M$ is the number of runs made (see (3.18)). For the cases we considered, $M$ is generally large enough that whether $r = 2$ or $r = 8$ makes little difference in any resulting power calculation.
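The first reason can be made concrete. For $k$ equal effects, $\tau^2 = kA^2$ and $\phi^2 = (k - 1)A^2 + \sigma^2$, so (3.12) reduces to the expression quoted above. The plain-Python sketch below uses hypothetical helper names and an arbitrary $N = 40$. It compares $r = 2$ with $r = 8$ against the limit $N/(k - 1)$ for two of the $(K, p)$ combinations studied.

```python
def delta1_sq_equal(N: int, k: int, r: float) -> float:
    """delta_1^2 = N / (k - 1 + r^-2) when k effects share |beta| = A and r = A / sigma."""
    return N / (k - 1 + r ** -2)


def delta1_sq_general(N: int, k: int, A: float, sigma: float) -> float:
    """Same quantity from (3.12): N A^2 / phi^2, with phi^2 = k A^2 + sigma^2 - A^2."""
    return N * A**2 / (k * A**2 + sigma**2 - A**2)


N = 40
for K, p in [(100, 0.05), (200, 0.15)]:
    k = round(p * K)                       # number of active factors
    print(K, k, round(delta1_sq_equal(N, k, 2), 3), round(delta1_sq_equal(N, k, 8), 3),
          round(N / (k - 1), 3))
```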

In sum, the results indicate that neither strategy dominates the other. Moreover, they suggest that an RP strategy should be considered in situations where it is important to keep $a$ low.

Finally, to illustrate a direct application of the results of Section 3, consider a simulation model with $K = 200$ factors, and suppose we want to employ an RB$(N, \alpha_1)$ screening strategy. The parameters $N$ and $\alpha_1$ are, of course, at our disposal. Suppose further that we anticipate an average absolute effect of roughly $0.5\sigma$, with a standard deviation of $1.5\sigma$. We imagine that the absolute magnitudes of the effects have a relative frequency distribution similar to the one illustrated in Figure 2.

It is convenient to define $\gamma_j = |\beta_j|/\sigma$, the ratio of the $j$th absolute effect to $\sigma$. In terms of the $\gamma$'s, the square of the noncentrality parameter associated with the $j$th effect (see (3.12)) can be written as
$$ \delta_{1j}^2 = N\gamma_j^2 \Big/ \Big( \sum_{i=1}^{K} \gamma_i^2 + 1 - \gamma_j^2 \Big). \tag{4.4} $$

From our assumptions above, $\sum \gamma_i/200 = .5$ and $\sum (\gamma_i - .5)^2/200 = (1.5)^2$. It follows that $\sum \gamma_i^2 = 200(2.25 + .5^2) = 500$. Therefore,


$$ \delta_{1j}^2 = N\gamma_j^2/(501 - \gamma_j^2). \tag{4.5} $$

Of course, the larger $\delta_{1j}^2$ is, the greater the power of the testing procedure (see (3.13)). Further, (4.5) is increasing in $\gamma_j$, so if $|\beta_j| > |\beta_m|$, then $\delta_{1j}^2 > \delta_{1m}^2$.


To make any power calculations, we need to specify a value or values of $\gamma_j$. For illustration, suppose we wish to consider the power for an effect of magnitude $3\sigma$, that is, $\gamma_j = 3$. Substituting $\gamma_j = 3$ into (4.5) gives $\delta_{1j}^2 = 9N/492$. Therefore, for a given $N$ and $\alpha_1$, we can compute the power.
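The power calculation just described can be scripted. The sketch below assumes SciPy and uses a hypothetical helper name. It evaluates $\psi_1$ with $\delta_{1j}^2 = 9N/492$ at $\alpha_1 = .15$ for several values of $N$. Its output can be compared with Table 3.

```python
from scipy import stats


def rb_power_example(N: int, alpha1: float, gamma_j: float = 3.0,
                     sum_gamma_sq: float = 500.0) -> float:
    """Power psi_1 for an effect of gamma_j * sigma in the K = 200 example, via (4.4) and (3.13)."""
    delta_sq = N * gamma_j**2 / (sum_gamma_sq + 1.0 - gamma_j**2)   # (4.4); 9N/492 for gamma_j = 3
    crit = stats.t.ppf(1 - alpha1 / 2, N - 2)
    T = stats.nct(N - 2, delta_sq ** 0.5)
    return float(T.sf(crit) + T.cdf(-crit))


for N in (40, 80, 118, 160):
    print(N, round(rb_power_example(N, alpha1=0.15), 3))
```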

Table 3 gives the power against an effect of magnitude $3\sigma$ for a number of RB$(N, \alpha_1)$ strategies. (From the previous remark, power is greater for any effect larger than $3\sigma$ in absolute magnitude.) Table 3 can guide the selection of a suitable RB$(N, \alpha_1)$ strategy. For example, suppose we control $\alpha_1$ at .15 and want a power of at least 0.50 for effects of $3\sigma$ or more in absolute magnitude. Then, by interpolation, $N$ must be at least 118 runs. If an $N$ of this size is not feasible, then additional tradeoffs must be made, or we should investigate an alternative screening plan.

The use of an RP$(N, \alpha_1, \alpha_2)$ strategy could be investigated in a similar manner, again using the results of Section 3. Here, of course, the effects of three parameters, not two, would have to be considered.
5. SUMMARY AND DISCUSSION


In this paper we have focused on the problem of factor screening (i.e., the identification of the important variables) in computer simulation. Specifically, we have discussed and evaluated two factor screening strategies, both based on random balance sampling. The two strategies are intended for supersaturated situations, that is, where the number of variables to be screened exceeds the number of runs available. This type of situation frequently arises in simulation, particularly in the study of large, complex models.

In any screening application, we must consider both the accuracy of factor identification and the number of runs the strategy requires. For the two strategies considered, we have developed approximations to (1) the probability that a given factor is declared important and (2) the total number (or expected number) of required runs. These approximations compared very favorably with the corresponding Monte Carlo estimates. More importantly, they give the user quantitative information on the tradeoffs involved in particular screening applications. Accordingly, the results of this paper can serve as a practical guide for making objective decisions about the use of the two strategies studied.

Although other supersaturated screening strategies, such as group screening ([7], [11]), have been suggested, there has been little or no systematic evaluation of their performance. It therefore remains to be seen how these methods compare with the two strategies examined here.
APPENDIX

If $H$ is a discrete random variable having probability distribution

$$ P(H = h) = \begin{cases} \binom{r}{h}^2 \Big/ \binom{2r}{r}, & h = 0, 1, 2, \ldots, r, \\ 0, & \text{otherwise}, \end{cases} \tag{A.1} $$

where $r$ is any positive integer, we write $H \sim H(r)$. The class of distributions defined in (A.1) is a symmetric subfamily of the hypergeometric family of distributions. If $H \sim H(r)$, then $E(H) = r/2$ and $V(H) = (r/2)^2/(2r - 1)$.
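The stated moments of $H(r)$ can be confirmed exactly with rational arithmetic. This small Python check uses only the standard library; the helper names are hypothetical. It sums (A.1) directly and compares the result with $r/2$ and $(r/2)^2/(2r - 1)$.

```python
from fractions import Fraction
from math import comb


def h_pmf(h: int, r: int) -> Fraction:
    """P(H = h) for H ~ H(r), eq. (A.1): C(r, h)^2 / C(2r, r)."""
    return Fraction(comb(r, h) ** 2, comb(2 * r, r))


def h_moments(r: int):
    """Exact mean and variance of H(r) by direct summation over h = 0, ..., r."""
    mean = sum(h * h_pmf(h, r) for h in range(r + 1))
    var = sum((h - mean) ** 2 * h_pmf(h, r) for h in range(r + 1))
    return mean, var


for r in (2, 5, 8):
    mean, var = h_moments(r)
    print(r, mean, var, Fraction(r, 2) ** 2 / (2 * r - 1))   # last two columns agree
```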

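We define $\mathbf{f}_i = (\mathbf{x}_i + \mathbf{1})/2$, where $\mathbf{x}_i$ is the $i$th column vector of an RB design matrix, and note that, for $i \ne j$, $\mathbf{f}_i'\mathbf{f}_j \sim H(N/2)$. It follows that, for $i \ne j$, $(\mathbf{x}_i'\mathbf{x}_j + N)/4 \sim H(N/2)$. Hence, using the moments of $H(r)$ with $r = N/2$, for $i \ne j$,

$$ E(\mathbf{x}_i'\mathbf{x}_j) = 0 \tag{A.2} $$

and

$$ V(\mathbf{x}_i'\mathbf{x}_j) = N^2/(N - 1). \tag{A.3} $$

Regarding the distribution of $\mathbf{x}_j'\boldsymbol{\epsilon}$, we observe that $\mathbf{x}_j'\boldsymbol{\epsilon} \mid \mathbf{x}_j \sim N(0, N\sigma^2)$, since $\mathbf{x}_j'\mathbf{x}_j = N$. Because the conditional distribution of $\mathbf{x}_j'\boldsymbol{\epsilon}$ given $\mathbf{x}_j$ is the same for every realization of $\mathbf{x}_j$, the result also holds unconditionally. Thus, $E(\mathbf{x}_j'\boldsymbol{\epsilon}) = 0$ and $V(\mathbf{x}_j'\boldsymbol{\epsilon}) = N\sigma^2$.

To find the covariance between $\hat\beta_i$ and $\hat\beta_j$, for $i \ne j$, we have

$$ \operatorname{cov}(\hat\beta_i, \hat\beta_j) = E(\hat\beta_i\hat\beta_j) - \beta_i\beta_j. $$

We write $N\hat\beta_i = \gamma_i + \lambda_i$, where $\lambda_i = \mathbf{x}_i'\boldsymbol{\epsilon}$ and $\gamma_i = \sum_{q=1}^{K} \beta_q \mathbf{x}_i'\mathbf{x}_q$. Thus,

$$ N^2 E(\hat\beta_i\hat\beta_j) = E(\gamma_i\gamma_j) + E(\gamma_i\lambda_j) + E(\gamma_j\lambda_i) + E(\lambda_i\lambda_j). $$

It is easy to show that $E(\gamma_i\lambda_j) = E(\gamma_j\lambda_i) = E(\lambda_i\lambda_j) = 0$, so that

$$ \operatorname{cov}(\hat\beta_i, \hat\beta_j) = E(\gamma_i\gamma_j)/N^2 - \beta_i\beta_j. \tag{A.4} $$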
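Expanding $E(\gamma_i\gamma_j)$, we have

$$ E(\gamma_i\gamma_j) = \sum_{q=1}^{K}\sum_{r=1}^{K} \beta_q\beta_r E(\mathbf{x}_i'\mathbf{x}_q\, \mathbf{x}_j'\mathbf{x}_r). \tag{A.5} $$

It is not difficult to verify that, for $i \ne j$, $E(\mathbf{x}_i'\mathbf{x}_q\, \mathbf{x}_j'\mathbf{x}_r)$ is zero unless $q = i, r = j$ or $q = j, r = i$. When $q = i$ and $r = j$, $E(\mathbf{x}_i'\mathbf{x}_i\, \mathbf{x}_j'\mathbf{x}_j) = N^2$. When $q = j$ and $r = i$, $E(\mathbf{x}_i'\mathbf{x}_j\, \mathbf{x}_j'\mathbf{x}_i) = E[(\mathbf{x}_i'\mathbf{x}_j)^2] = V(\mathbf{x}_i'\mathbf{x}_j) = N^2/(N - 1)$. Substitution into (A.5) yields

$$ E(\gamma_i\gamma_j) = N^3\beta_i\beta_j/(N - 1). \tag{A.6} $$

Finally, introducing (A.6) into (A.4), we obtain the desired result

$$ \operatorname{cov}(\hat\beta_i, \hat\beta_j) = \beta_i\beta_j/(N - 1). \tag{A.7} $$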


Table 2: Results of Case Study II. (Values in parentheses are the estimated standard deviations of the Monte Carlo estimates.)